Design and Implementation of the ScaLAPACK LU, QR, and Cholesky Factorization Routines

Authors

  • Jaeyoung Choi
  • Jack J. Dongarra
  • Susan Ostrouchov
  • Antoine Petitet
  • David W. Walker
  • R. Clinton Whaley
Abstract

This paper discusses the core factorization routines included in the ScaLAPACK library. These routines allow the factorization and solution of a dense system of linear equations via LU, QR, and Cholesky factorization. They are implemented using a block cyclic data distribution, and are built on de facto standard kernels for matrix and vector operations (BLAS and its parallel counterpart, PBLAS) and for message-passing communication (BLACS). In implementing the ScaLAPACK routines, a major objective was to parallelize the corresponding sequential LAPACK routines using the BLAS, BLACS, and PBLAS as building blocks, leading to straightforward parallel implementations without a significant loss in performance. We present the details of the implementation of the ScaLAPACK factorization routines, as well as performance and scalability results on the Intel iPSC/860, Intel Touchstone Delta, and Intel Paragon systems.
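
The block cyclic data distribution mentioned in the abstract can be illustrated with a small index-mapping sketch. This is not code from the paper or from ScaLAPACK itself. The function names and parameters (`mb`, `nb`, `p_rows`, `p_cols`) are hypothetical, indices are 0-based, and the first block is assumed to live on process (0, 0).

```python
def block_cyclic_owner(i, j, mb, nb, p_rows, p_cols):
    """Return the (process row, process column) that owns global entry (i, j).

    Assumption: mb x nb blocks are dealt out round-robin over a
    p_rows x p_cols process grid, starting at process (0, 0).
    """
    return (i // mb) % p_rows, (j // nb) % p_cols


def global_to_local(g, block, nprocs):
    """Map a 0-based global index along one dimension to (owner, local index)."""
    blk = g // block                    # which global block the index falls in
    owner = blk % nprocs                # blocks are assigned cyclically
    local_blk = blk // nprocs           # position of that block on its owner
    return owner, local_blk * block + g % block


if __name__ == "__main__":
    # Print the owner of each entry of an 8 x 8 matrix with 2 x 2 blocks
    # on a 2 x 3 process grid.
    for i in range(8):
        print(" ".join("%d,%d" % block_cyclic_owner(i, j, 2, 2, 2, 3)
                       for j in range(8)))
```

Because consecutive blocks go to different processes, every process keeps a share of the trailing submatrix as a factorization proceeds, which is the usual motivation given for this layout.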

Similar resources

The design and implementation of the parallel out-of-core ScaLAPACK LU, QR, and Cholesky factorization routines

This paper describes the design and implementation of three core factorization routines (LU, QR, and Cholesky) included in the out-of-core extension of ScaLAPACK. These routines allow the factorization and solution of a dense system that is too large to fit entirely in physical memory. The full matrix is stored on disk and the factorization routines transfer sub-matrix panels into memory. The ‘l...

The Design and Implementation of the Parallel Out-of-core ScaLAPACK

This paper describes the design and implementation of three core factorization routines (LU, QR, and Cholesky) included in the out-of-core extension of ScaLAPACK. These routines allow the factorization and solution of a dense system that is too large to fit entirely in physical memory. An image of the full matrix is maintained on disk and the factorization routines transfer sub-matrices into mem...

SCALABILITY ISSUES AFFECTING THE DESIGN OF A DENSE LINEAR ALGEBRA LIBRARY

This paper discusses the scalability of Cholesky, LU, and QR factorization routines on MIMD distributed memory concurrent computers. These routines form part of the ScaLAPACK mathematical software library that extends the widely-used LAPACK library to run efficiently on scalable concurrent computers. To ensure good scalability and performance, the ScaLAPACK routines are based on block-partitioned...

PoLAPACK: parallel factorization routines with algorithmic blocking

LU, QR, and Cholesky factorizations are the most widely used methods for solving dense linear systems of equations, and have been extensively studied and implemented on vector and parallel computers. Most of these factorization routines are implemented with block-partitioned algorithms in order to perform matrix-matrix operations, that is, to obtain the highest performance by maximizing reuse of...

Implementing a parallel matrix factorization library on the cell broadband engine

Matrix factorization (often called decomposition) is a frequently used kernel in a large number of applications ranging from linear solvers to data clustering and machine learning. The central contribution of this paper is a thorough performance study of four popular matrix factorization techniques, namely LU, Cholesky, QR, and SVD, on the STI Cell broadband engine. The paper explores algori...

Journal:
  • Scientific Programming

Volume 5

Publication date: 1996